DOCUMENT RESUME

ED 175 612

AUTHOR        Johnson, Richard T.; Thomas, Wayne P.
TITLE         User Experiences in Implementing the RMC Title I
              Evaluation Models.
PUB DATE      Apr 79
NOTE          17p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (63rd, San
              Francisco, CA, April 8-12, 1979)

EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   Administrative Problems; Compensatory Education
              Programs; Data Processing; Elementary Secondary
              Education; *Evaluation Needs; *Guidelines; Models;
              *Program Administration; Program Development; Program
              Effectiveness; *Program Evaluation; Student Testing;
              Testing Problems; *Testing Programs
IDENTIFIERS   Elementary Secondary Education Act Title I;
              *Evaluation Problems; *RMC Models

ABSTRACT

The experiences of state and local education agencies in implementing
the RMC evaluation models in their evaluations of Elementary Secondary
Education Act Title I programs are discussed with emphasis on the
problems encountered, suggestions for resolving these problems, and
encouraging results which have been reported. Four activities are
described as common for all of the agencies which have been using the
RMC models to measure achievement gains: test selection and
administration; scoring and conversion of scores; data analysis; and
data aggregation. Problems which have been encountered in implementing
the RMC models are classified as (1) procedural: selection of student
samples, tests, and RMC model, and communication of results;
(2) clerical: conversion of raw scores, comparison of pretest and
post-test scores, and failure to record testing date; and
(3) analytical: errors in data analysis. A total of 27 guidelines are
suggested which should be helpful in reducing these problems. Several
positive comments regarding the usefulness and potential of the RMC
models are included. (GSC)

Reproductions supplied by EDRS are the best that can be made from the
original document.






USER EXPERIENCES IN IMPLEMENTING
THE RMC TITLE I EVALUATION MODELS

Richard T. Johnson and Wayne P. Thomas
Virginia Polytechnic Institute and State University

April 1979






Paper presented at the Annual Meeting of the
American Educational Research Association
San Francisco, California, April 7-13, 1979
Session 24.24



Many school systems in the country have made progress in testing and
implementing the ESEA Title I Evaluation and Reporting System. During the
1976-77 school year, 28 of the SEA's across the nation had LEA's using one
or more of the proposed models to evaluate the reading and/or math outcomes
attributable to participation in a Title I project. Most of these LEA's
were using the techniques for the first time. Information gleaned from the
evaluation personnel in these LEA's gives an idea of the administrative
procedures involved in effecting the change to new methods. The Title I
staffs in the states with the longest and most comprehensive experience in
using the new procedures (Florida, Iowa, Maine, Ohio, South Carolina, and
South Dakota) have shared the results of their experiences in implementing
the Title I evaluation models and have done so in particularly helpful ways.
State Title I officials and the local staffs of more than 20 school
divisions in the Commonwealth of Virginia have also been very cooperative in
sharing their experiences in implementing the models.

The legal mandate for this "field test" and documentation effort is
in Subsection F of Section 151 of ESEA Title I, which states that the
"Commissioner of Education must require SEA's and LEA's to use techniques
and methodology ... for producing data which are comparable on a statewide
and nationwide basis." The development and publication of valid evaluation
models as required by Section 151 is the first step toward the nationwide
production of comparable data; subsequent necessary steps are the monitoring
of the use of the models and the development of means to make their use as
error-free as possible.

The purpose of this paper is to describe the problems experienced by
state and local personnel who used the evaluation models to assess project
effects, to suggest ways in which these problems can be solved, and to
mention some very encouraging results that have come about through the use
of the models.



There are many ways in which LEA's and SEA's have gathered, analyzed,
aggregated and reported Title I evaluation information. Some states
currently gather data at the individual student level from the LEA's and
then employ central processing; other states rely entirely on their LEA's to
analyze their own data. Some states have automated virtually the entire
evaluation process, using scoring services and automatic data processing;
others rely on information that is scored and analyzed entirely by hand.
In states that have both very large and very small LEA's, procedures vary
tremendously even within a state.

There are many reasons for this wide variety of practice across the
national Title I system. Capabilities and support systems vary widely.
Different philosophies and priorities are used to set policies. The
number of Title I students in a district may range from the tens to the
tens of thousands, and the amounts of Title I grants might range from the
tens of thousands to the tens of millions of dollars. Thus, it is difficult
to envision a unique correct way to do things; the system must be designed
to function efficiently with alternative methods for data handling,
processing, analyzing, and aggregating.

Although the evaluation methods and capabilities of the states and
localities that have implemented the RMC Title I evaluation models vary
widely, it appears that the flow of achievement data through the evaluation
systems involves certain tasks which are common to all states and
localities. These tasks are (1) the selection and administration of tests,
(2) the scoring of instruments and conversion of scores, (3) the analysis of
data, and (4) the aggregation of data.

Selection/administration of tests. At this initial phase of an
evaluation, several steps are important: (1) the proper test must be
selected; (2) the correct level of it must be administered; (3) the test
administration procedures must be standard; and (4) the testing conditions
must be appropriate.

The "proper" test is, foremost, one which measures what is being taught.
Much research (Armbruster, Stevens, and Rosenshine, 1977; Bianchini, 1976;
Hoepfner, 1976; Stearnes, 1977; and Tallmadge, 1977) has highlighted the
degree to which different standardized tests emphasize different subskills
within a skill area. Title I evaluators are advised that the more closely
their tests correspond to the project objectives, the more relevant the
scores will be for detecting student growth in the project (Fagan and Horst,
1978). This fairly common-sense notion is often disregarded as other
factors influence test selection.

The "proper" test has empirical normative data on children similar to
the children in the project and gathered at dates in the school year
corresponding to the pre- and post-test dates for evaluation using Model A.
Also, the norm data are from a representative national or local sample of
children.

Using the correct level of the test means administering one on which the
fewest children possible score either at the "chance" level or at the top
score. This is important because a preponderance of the former, placing many
students at the "floor" of the test, artificially inflates the group's
pretest average, thereby overstating their status before the project. Their
gain due to the project would then be underestimated. Similarly, if the
students "top out" on the post-test, the group's status after the project is
underestimated, and the resulting gain figure is again too small. Of course,
the mismatch of test level with student skill levels can also affect
evaluation results in other ways; the important consideration is that
students' performance levels be reflected as accurately as possible. Use of
the wrong level of the test precludes this (Roberts, A.O.H., 1978;
Roberts, S., 1978).
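A quick screen for a mismatched test level can be sketched in a few lines.
The example below is only an illustration, not a procedure prescribed by the
models: it assumes a multiple-choice test where the "chance" score is the
number of items divided by the number of answer options, and the function
name and sample scores are hypothetical.

```python
def floor_ceiling_rates(raw_scores, n_items, n_choices):
    """Return the share of students at or below chance and at the top score."""
    # Assumption: blind guessing yields about n_items / n_choices correct.
    chance_score = n_items / n_choices
    n = len(raw_scores)
    at_floor = sum(1 for s in raw_scores if s <= chance_score) / n
    at_ceiling = sum(1 for s in raw_scores if s == n_items) / n
    return at_floor, at_ceiling


# Hypothetical raw scores on a 40-item test with 4 answer options per item.
scores = [8, 10, 12, 25, 31, 40, 40, 18, 9, 22]
floor_rate, ceiling_rate = floor_ceiling_rates(scores, n_items=40, n_choices=4)
print(f"at or below chance: {floor_rate:.0%}; at top score: {ceiling_rate:.0%}")
```

A large share in either group would suggest that a lower or higher level of
the test should be considered for that group of students.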

All tests have standardized procedures outlined for their use. Even
"homemade" instruments have instructions for administering them. Such
procedures may include timing, use of practice items, degree of assistance
from the test proctors, etc. Furthermore, the testing conditions must be
good. For example, the rooms should be quiet; the settings and times for
testing project and "comparison" group children should be similar (Horst,
1978; Tallmadge and Roberts, 1978). In order for students' test scores to be
comparable to those of others, especially to the norm data, the outlined
procedures must be followed (Horst, 1978).

Scoring of instruments/conversion of scores. This step involves
determining the test score for each student. Occasionally, it is done by
hand but, more often, a scoring service is used. Student answer sheets are
sent away to a firm which will return lists of students tested, the raw
score for each, and requested score conversions. However, problems may arise
with lost answer sheets, incorrectly coded ones, damaged pages, or errors in
the conversion of scores.

Due to the mathematics required by the analytical procedures in an
evaluation, students' raw scores must be converted to a standard score
metric. In most cases, the preferable standard score metric for the
computations is one which incorporates characteristics of the national
distribution of scores for the age group: the normal curve equivalent (NCE).
In order to derive that figure, as many as three or four separate
conversions may be necessary for each student's score. A typical sequence
would be to convert the child's raw score to the publisher's standard score,
then to a national rank or percentile, and finally to an NCE.
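The last step of that sequence, percentile to NCE, can be sketched as
follows. This is an illustration only; in practice the conversions described
here were made from published tables. The formula used is the standard
definition of the NCE scale (mean 50, standard deviation 21.06, so that NCEs
of 1, 50, and 99 coincide with percentiles 1, 50, and 99); it is not quoted
from this paper, and the function name is hypothetical.

```python
from statistics import NormalDist

# Assumption: the conventional NCE scale, a normalized score with mean 50
# and standard deviation 21.06.
NCE_MEAN = 50.0
NCE_SD = 21.06


def percentile_to_nce(percentile):
    """Convert a national percentile rank (1-99) to a normal curve equivalent."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    z = NormalDist().inv_cdf(percentile / 100)  # normal deviate for the rank
    return NCE_MEAN + NCE_SD * z


for rank in (1, 25, 50, 75, 99):
    print(rank, round(percentile_to_nce(rank), 1))
```

Because the transformation is nonlinear, equal percentile differences do not
correspond to equal NCE differences, which is one reason averages are
computed on NCEs rather than on percentiles.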

Some scoring services can provide all of these scores; all can provide
the percentile equivalents. Typically, there is a charge for each additional
score requested, so the use of a service for all of the conversions would be
expensive for large projects. Scores from scoring services are usually more
accurate than scores that have been manually tallied and converted, however.

Data analysis. This is the phase in the Title I evaluation where data
from individuals are combined into project-level statistics. NCE gains,
describing the effectiveness of the Title I project in contributing to the
students' learning above and beyond what is expected from the "regular"
curriculum, are calculated. Each model prescribes the appropriate analytical
techniques, which range from fairly straightforward computations in the case
of Model A to complex statistical manipulations in Models B and C.

Data aggregation. This is the final step in the Title I evaluation system
prior to the actual reporting of the evaluation results. Errors made at this
stage of the evaluation are not likely to be serious, since the data on which
the aggregations are based, the project-level NCE gains, are usually
accessible for later double-checking. This phase usually takes place at the
SEA level.

PROBLEMS ENCOUNTERED IN IMPLEMENTING THE MODELS

In using the new models, SEA's and LEA's had problems and made mistakes
in three areas: procedural, referring to adherence to suggested rules of the
models; clerical, referring to recording, translating, and calculating; and
analytical, referring to technical and statistical problems.

Procedural. The first procedural problem is in the selection of Title I
students. The requirement of Model A that selection of Title I participants
be based on data other than from the pretest is the requirement that has been
most often ignored. Even among the states which were the first to attempt
the models, states with extremely competent Title I personnel, the proportion
of LEA's still selecting on the pretest exceeded half of those reporting.

SEA personnel were beset with questions from the LEA's on how to follow
the model's rules without deviating from present practices or perceived
requirements. Various LEA and SEA personnel advised their LEA's of methods
by which to choose students without invalidating the evaluations, but the
LEA's seldom tried the advised methods or applied the methods validly. For
example, one evaluator suggested choosing students based on a combination of
standardized achievement test scores, absenteeism, a teacher's estimate of
the student's achievement, an estimate of underachievement, and an estimate
of motivation and health. When the reports were returned, it was discovered
that most school districts had used the pretest standardized test score
combined with an estimate of underachievement which had been obtained by the
subtraction of the student's pretest score from the class average. Thus,
only a single criterion was used and the choice of individuals to be placed
in the Title I group was based solely on the pretest.
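The reason selection on the pretest invalidates a Model A evaluation is
regression toward the mean: students chosen for low pretest scores tend to
score higher on retest even with no treatment. The simulation below is a
rough sketch of that effect under assumptions of our own (normally
distributed true scores, independent measurement errors of equal size, a
fixed cutoff, and no project effect at all); none of the numbers come from
the field data discussed in this paper.

```python
import random
from statistics import mean

random.seed(1979)  # fixed seed so the sketch is repeatable


def simulate(n=20000, cutoff=35.0):
    """Mean pre-to-post change for two selection rules, with no treatment."""
    gains_selected_on_pretest = []
    gains_selected_separately = []
    for _ in range(n):
        true_score = random.gauss(50, 18)             # hypothetical NCE-like scale
        selection = true_score + random.gauss(0, 10)  # separate selection measure
        pretest = true_score + random.gauss(0, 10)
        posttest = true_score + random.gauss(0, 10)   # no project effect added
        if pretest < cutoff:
            gains_selected_on_pretest.append(posttest - pretest)
        if selection < cutoff:
            gains_selected_separately.append(posttest - pretest)
    return mean(gains_selected_on_pretest), mean(gains_selected_separately)


spurious_gain, unbiased_gain = simulate()
print(f"selected on the pretest:        {spurious_gain:+.1f}")
print(f"selected on a separate measure: {unbiased_gain:+.1f}")
```

Under these assumptions the group selected on the pretest shows a sizable
apparent gain that is purely an artifact, while the group selected on a
separate measure shows a change close to zero.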

The second most common procedural problem lies in the administration of
norm-referenced tests at the proper time in Model A. Comparisons between the
Title I group and the publisher's norms are most valid when based on
real data points, so the tests should be given during a four- or six-week
period spanning the test publisher's main norming date. This was frequently
not done by the LEA's that we studied. LEA personnel generally wish to test
as early in the school year as possible (e.g., in the middle of September).
However, the midpoint of a publisher's test norms may not occur until
November.

Another procedural problem is that of communicating to local
administrators the meaning of the new NCE metric. According to a number of
state leaders, parents accept the idea with little hesitation, but
administrators, especially superintendents, resist strongly. The use of a
variety of metrics for sharing results with interested parties will help
alleviate this problem.

A final procedural matter is the fact that administrators consider local
funding allocations for testing when they choose Model A, B, or C. The
requirements of Model C include the testing of comparison students from among
those not in the project. Therefore, if the district already has budgeted
funds for testing once a year, and selection of Title I students on that test
is acceptable, then Model C is a logical choice. But if the money for testing
has not been allocated, the choice can just as logically be Model A.

Clerical. The translation of a raw score into any other score is fraught
with error. In one state, data from more than 93% of the LEA's contained at
least one table look-up error. Another state director reported that the
majority of errors in his workshop exercises stemmed from the inappropriate
use of the same norm table for both pretest and posttest score conversions in
spite of very obvious table titles. Where yet another had modified the
format of the publishers' norm tables, the error rates were considerably
reduced, but still excessive.

Another problem is that often the gain score for an individual turns out
negative, and negative numbers appear to be an anathema to proper
calculations. One evaluator was so upset by negative scores that he ignored
every one in averaging gains in his project.
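The effect of discarding negative gains is easy to see with a small worked
example. The sketch below uses made-up NCE scores for five matched students
and hypothetical function names; it simply averages all individual gains and
then, for contrast, only the positive ones.

```python
from statistics import mean


def mean_nce_gain(pre_nce, post_nce):
    """Average NCE gain over matched students, keeping negative gains."""
    if len(pre_nce) != len(post_nce):
        raise ValueError("pretest and posttest lists must be matched one-to-one")
    return mean(post - pre for pre, post in zip(pre_nce, post_nce))


# Hypothetical matched scores; two of the five students show a loss.
pre = [30, 35, 28, 40, 33]
post = [34, 31, 35, 38, 39]

all_gains = mean_nce_gain(pre, post)
positive_only = mean([b - a for a, b in zip(pre, post) if b - a > 0])

print(f"all students:        {all_gains:.1f}")
print(f"negatives discarded: {positive_only:.1f}")
```

Dropping the two losses more than doubles the apparent project gain in this
example, which is why every matched student belongs in the average.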

The assignment of a pretest or posttest score to every individual is only
part of the process; the two tests must be matched for each individual. In
large districts the matching is generally done on a computer. Here several
difficulties appear: a matching program must be written; bad coding or
punching takes a toll of properly matched individuals; a single unmatched
card in a sorted file makes the entire remaining ones mismatched and makes
the evaluation worthless if left undiscovered; students change the spelling
of their names; and inconsistencies appear in using the last name first.

In smaller districts, the match is carried out by hand. Here the
human intelligence can remedy many of the problems: one can see that
"Bill Thorne" on the pretest is "Thorne, W." on the post, for example. A
knowledgeable individual might remember that a school boundary was recently
changed, and half the students are in the neighboring school, or that the
Jones children oscillate regularly between two schools. Thus increased
difficulty in matching by hand is offset by the increased frequency of
matches.
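One way a matching program can avoid the "single unmatched card" failure is
to pair records by a key rather than by position in a sorted file, and to
list everything left unmatched for review by hand. The sketch below assumes
each record carries a stable student identifier, which the districts
described here may not have had; the identifiers and scores are invented.

```python
def match_records(pretests, posttests):
    """Pair pretest and posttest scores by student ID; report the leftovers."""
    matched = {
        sid: (pretests[sid], posttests[sid])
        for sid in pretests
        if sid in posttests
    }
    pre_only = sorted(set(pretests) - set(posttests))
    post_only = sorted(set(posttests) - set(pretests))
    return matched, pre_only, post_only


# Hypothetical files keyed by student ID, with NCE scores as values.
pretests = {"1001": 31, "1002": 28, "1003": 40, "1005": 35}
posttests = {"1001": 36, "1003": 39, "1004": 44, "1005": 41}

matched, pre_only, post_only = match_records(pretests, posttests)
print("matched:", matched)
print("pretest only:", pre_only)    # candidates for a hand search
print("posttest only:", post_only)
```

The unmatched lists are where the knowledgeable individual described above
would step in, for example to recognize a respelled name or a student who
changed schools.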

A comparatively minor error in reporting is the failure to include the
day of the month on which testing was accomplished. Under Model A, a user of
the ITBS would be expected to test within a two-week interval on either side
of April 28th. If he merely reported that the test was administered in April,
the state evaluator cannot assume that the test was given at the proper time;
it could have been on the first of April. The error is minor in that it can
be corrected easily with a change in the reporting requirements.
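Once the full testing date is reported, the check against the norming
window is mechanical. The sketch below uses the April 28th norming date and
two-week interval from the ITBS example; the year and the function name are
assumptions made only for illustration.

```python
from datetime import date, timedelta


def within_norming_window(test_date, norming_date, half_width_days=14):
    """True if test_date is within half_width_days of the norming date."""
    return abs(test_date - norming_date) <= timedelta(days=half_width_days)


norming = date(1979, 4, 28)  # the year is assumed for the example

first_of_april_ok = within_norming_window(date(1979, 4, 1), norming)
late_april_ok = within_norming_window(date(1979, 4, 20), norming)
print(first_of_april_ok, late_april_ok)
```

A report that gives only the month cannot be checked this way, which is the
point of requiring the day.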

Analytical. A variety of technical questions dealing with the
statistical and psychometric aspects of the system continue to plague the
evaluators. First, are evaluators jeopardizing the accuracy of evaluation
results by testing once a year in the spring? If students forget a great deal
during the summer, perhaps they would show greater gains if they were tested
in both fall and spring. Second, when students repeat a grade, what pretest
score should be used: the first pretest or the second? What norms should be
used for such students?

Some state personnel note that the correction of an LEA's evaluation
error may result in a lowered gain estimate. They recognize, however, that
no one wants to be fooled into assuming successes in remediating children's
educational problems if the remediation has not occurred.

The sample of individual project reports perused in the states were
rated on their quality of evaluation. Those projects with the "best"
evaluation showed low but positive gain scores for the Title I group. (The
correlation between evaluation quality and size of gain was -.25.) Those
projects which appeared to be evaluated correctly showed a modest positive
impact from the program.

SUGGESTIONS FOR IMPROVEMENTS

When the directors and evaluators in the SEA's and LEA's identified the
problems outlined above, they also suggested a variety of solutions. Some
approaches had already been tried out, and other possibilities arose in
discussions.

People we have worked with and other interested parties have suggested
ways USOE can help SEA's and LEA's follow more completely the procedures
outlined in the evaluation and reporting system. Their suggestions are
summarized below according to the same categories used in preceding sections:
procedural, clerical, analytical.

Procedural. Most of these apply to the general implementation rules
and other administrative areas.

1. Provide a detailed special handbook on the implementation of each
model. The handbook should be very elementary, in step-by-step flow chart
fashion, with plenty of concrete examples of documents and approaches which
have worked.

2. Emphasize reduced testing requirements with the proposed models. Too
many school personnel are too worried about too many tests. Specify, for each
model, the minimum testing possible.

3. To encourage the proper administration of tests, encourage those
districts using once-a-year spring testing to have the teacher from the next
higher grade give the test. To the third grade teacher testing the second
grade students at the end of the year, accuracy would be paramount since next
year he would have those very students and would, supposedly, welcome
accurate test scores in their folders.

4. Give more guidance regarding test selection. Many studies have
demonstrated the importance of test content for detecting student growth in
specific skill areas. Certain tests may be more sensitive than others to the
skills content of many Title I programs.

5. Add more information to the handbook on out-of-level testing. Most
individuals still feel very uncomfortable attempting to implement
functional-level testing although they recognize the necessity with some
students. Prepare a detailed checklist which states can give to LEA's,
showing the effort involved in ordering, the logistics of testing, and the
scoring and translating of scores. Discuss the pay-offs for these extra steps
in terms of an increase in accuracy of the scores. Point out and discuss
common misconceptions, such as the beliefs that functional-level testing will
result in the pretest average being lower, that it will result in choosing
the wrong students, that it will give inaccurate estimates of gain, or that
it will compare students at one grade unfairly with those at the next lower
grade.

6. Provide guidelines on what action to take if the raw score from
an out-of-level test administration leads to a converted score too low to
be included in the percentile conversion table.

7. Communicate the results of the data collected in this study to
publishers, especially the information about the needs for norming earlier in
the year, and for less confusing norms tables.

8. Remove the suggestion in current documentation that two-thirds of
the project should take place between the pre- and post-tests. The incidence
of failure to follow this requirement is negligible, and should diminish to
zero as districts move to appropriate testing dates. The requirement leads to
reporting of non-informative data and to problems when schools have provision
for students' return to their regular classrooms after they have mastered a
certain body of material.

9. Investigate the conditions under which combining results from
different grades is appropriate. If a comparison group for Model C is too
small within a particular grade, some addition of non-Title I students from
the next higher grade might be possible although technically they are not in
the project. In Model A, adding the two Title I students in 8th grade to
those in 7th might reduce the trauma attendant to finding an average of a 10
NCE loss (due to small size and unstable data).

10. In the light of the high rate of errors in table reading, consider
the use of a raw score reporting system. In the absence of mechanical aids
to table look-up, provide simpler score conversion tables to users and
communicate the problem to test publishers. Some publishers have already
provided vastly improved tables for those states in which the evaluator has
insisted upon them.

11. Provide assistance, both in guidelines and in a computer program,
for the plotting of pretest against posttest for all models. The visual
inspection of the scatterplots could be expected to indicate floor and
ceiling effects, gross non-linearity of relationships, and the presence of
unexpectedly high or low "gainers".

12. Classify evaluation errors by severity so that when two
implementation suggestions conflict, the Title I coordinator can have some
guidance in making a choice between the two.

13. At present, the federal government publishes excellent technical
reports on the RMC models, but SEA's and LEA's often do not obtain the
reports because they are expensive or because the SEA's and LEA's do not know
that the reports exist.

Make these reports available to SEA's at no cost. The SEA's could then
distribute them to LEA's.

14. Show some examples of how program objectives could be stated under
the new reporting format.

For example, one state requests that the LEA Title I director estimate
the NCE gain to be achieved for each grade within each project. If the third
grade reading project at Memorial Elementary has a reputation as the best,
then a seven NCE gain may be the objective. If it is the worst, then one-half
an NCE might be appropriate.

15. Prepare examples to help SEA's and LEA's communicate evaluation
results (including use of NCE's and percentiles) to Title I parents, school
boards, and teachers and administrators.

16. Consider alternative methods of tying a cost figure to a project.
Since districts generally spend about 75% of their budget for instructional
personnel direct costs, the reporting of only those costs may increase the
accuracy and decrease the reporting burden of the cost estimate.



Clerical. Clerical suggestions refer to recording, translating, and
calculating processes.

1. Instruct LEA evaluators, when looking up average standardized scores
in publishers' norm tables, to use the individual percentile norms table, not
the school norms table.

2. Remove any requirements for data point interpolation at the district
level. Interpolation appears to be more error-prone than it is worth.

3. Send copies of an exemplary testing-dates chart to all interested
parties. For example, the forms used in one state include a chart which
assists district personnel in avoiding test administration date errors.

4. Add a requirement for the project report to provide the average
selection NCE, where possible, and the average pretest NCE. This will allow
an easy edit check to see if the selection was based completely on the
pretest, and if the most needy students were chosen.

5. Revise the percentile-to-NCE conversion tables so that they will be
easier to read. Perhaps they could be placed in groups of ten, with clear
lines to demarcate the columns.

6. Develop optical scanning forms and software for the analysis of
scores from major tests. If a state, or large LEA, decides to centralize the
scoring process rather than have the local classroom teachers score their own
papers, this is the solution with the greatest long-range potential, though
perhaps the highest initial expense. Furthermore, although the accuracy and
speed of data processing will be vastly improved, the system must also have
a set of appropriate error-detection methods built in.

7. Encourage LEA's to score some tests by hand, even if they have
engaged a scoring service, in order to check the accuracy of the service.

8. Provide guidance regarding the use of automated data processing as
often as possible for the various score conversions and manipulations
necessary in the system. Encourage other approaches, too, to preserve the
integrity of the data:

(a) Staff should try to perform score conversion activities in teams,
with people double-checking the work of others whenever possible;

(b) Raw data should be stored (or sent to the LEA or SEA) to enable
later checking of a data sample for the correctness of the tables used,
score conversions, etc.;

(c) Tables should be re-formatted at the local level to allow for
easier reading (e.g., 2-column tables are much easier to work with than
multi-column tables); and

(d) Staff responsible for performing the score conversions should be
trained in the use and interpretation of standardized tests, so that they
will understand the meaning of the scores and will be able to detect
obviously wrong, inappropriate, or out-of-range scores.

A pre-programmed calculator or computer can also facilitate the data
analysis. For small LEA's and SEA's without access to larger computer
facilities, a set of programs and implementation materials is being developed
for use with hand-held programmable calculators available from $89 and up.
These programs can certainly aid the evaluator but may not offer a complete
set of diagnostic features (e.g., a plot of pretest against posttest scores)
of the types which have been suggested in this paper. Properly designed
microprocessor-based software appears to offer more acceptable and
cost-effective provisions for the extensive amount of data checking and
editing which should be done by the evaluator.
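Two of the edit checks suggested in this section can be sketched briefly:
catching out-of-range scores, and comparing the average selection NCE with
the average pretest NCE. The sketch is an illustration under our own
assumptions. In particular, treating nearly identical averages as a sign
that selection was based on the pretest is one reading of the suggested
check, and the tolerance value and function names are hypothetical.

```python
from statistics import mean


def out_of_range(nce_scores, low=1, high=99):
    """Return (position, score) pairs for NCE values outside the 1-99 range."""
    return [(i, s) for i, s in enumerate(nce_scores) if not low <= s <= high]


def selection_looks_like_pretest(selection_nce, pretest_nce, tolerance=0.5):
    """Flag a project whose average selection and pretest NCEs nearly coincide."""
    # Assumption: identical averages suggest the pretest doubled as the
    # selection measure; the tolerance is arbitrary and for illustration only.
    return abs(mean(selection_nce) - mean(pretest_nce)) <= tolerance


bad_scores = out_of_range([35, 120, 0, 50])
same_measure = selection_looks_like_pretest([30, 32, 28], [30, 32, 28])
separate_measure = selection_looks_like_pretest([30, 32, 28], [36, 39, 33])
print(bad_scores, same_measure, separate_measure)
```

Checks of this kind only flag reports for a closer look; a person still has
to decide what actually happened in the district.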

Analytical. The suggestions in this section refer to technical
characteristics of the models.

1. Clarify the severity of the regression hazard when two-stage selection
takes place. For instance, a potential Title I treatment group of 100
students may be identified by teacher referral. Then a pretest is given to
those 100, and the 95 students most in need are given treatment. The
regression effect is considerably less than if the five students most in need
were chosen.

2. Investigate further the trade-off in comparing project effects from
spring to spring versus fall to spring. Once-a-year testing is much easier
on the budget and school time, but this may be offset by the loss of students
between school years and the students' loss of knowledge over the summer.

3. Assist evaluators in the objective identification of "outliers"
(student data so extreme that they are likely in error). Outliers can
significantly affect estimates of project effect, especially when Model C is
used. For instance, in plotting a Model C implementation, one district
evaluator was surprised to find two comparison group individuals who were at
the top of the distribution on the pretest and almost at the bottom on the
posttest. Examination of their scores revealed that they had scored at the
99th percentile on the reading comprehension pretest without missing a single
item. In contrast, their vocabulary scores were at the 1st percentile.
Clearly some error had been made (by the scoring service, perhaps), and the
individuals' scores were dropped from the analysis. The result was that the
estimate of project effects changed from negative to positive.
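One possible objective rule for flagging outliers is sketched below. It is
not a method from the RMC documentation: it uses a robust "modified z-score"
based on the median and the median absolute deviation of the individual
gains, with the conventional constants 0.6745 and 3.5, all of which are our
own assumptions. The scores are invented, with one student patterned after
the case just described.

```python
from statistics import median


def flag_outlier_gains(pre, post, threshold=3.5):
    """Return positions of students whose gain is extreme for the group."""
    gains = [b - a for a, b in zip(pre, post)]
    center = median(gains)
    mad = median([abs(g - center) for g in gains])
    if mad == 0:
        return []  # no spread to judge against; inspect the plot by hand
    return [
        i for i, g in enumerate(gains)
        if 0.6745 * abs(g - center) / mad > threshold
    ]


# Hypothetical NCE scores; the last student drops from 99 to 19.
pre = [40, 45, 38, 50, 42, 47, 99]
post = [42, 48, 39, 54, 44, 50, 19]

suspects = flag_outlier_gains(pre, post)
print("check these records:", suspects)
```

A flagged record is a prompt to re-examine the answer sheet and subtest
scores, as the district evaluator did, not an automatic deletion.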

CONCLUSIONS

It is apparent that the process to change LEA evaluation activities to
conform with those prescribed by the evaluation models is laborious and, by
necessity, iterative. States we visited had staff pursuing this goal for as
long as two years, and many reported that more work is still needed.

Probably the most pervasive administrative problem for SEA's is that
they do not know and therefore cannot review what actually happens in the
LEA's. Of course, this is much more than just an evaluation problem, but it
greatly affects evaluation data. A test may have been given on other than the
reported dates (in one of the states, the test was supposedly administered on
a Sunday); the test administrator may have ignored the directions for proper
test giving; tests designed for group administration may have been given
individually or with relaxed time limits; tests may have been administered to
the wrong individuals.

Reliance on information that is scored, transformed and analyzed manually
appears to be a major threat to the validity of reported data. Though
conversion to automatic data processing wherever possible in the evaluation
system does not promise to be a total panacea, it seems a promising first
step.

Although the problems in instituting the RMC models are legion, both
state and local district personnel have found the models to be extremely
useful. The models allow state personnel to compare data across districts.
Because of this, some states are already using the evaluation data to
identify especially effective programs in specific LEA's and most states plan
to start using the data in this way. Other states have already sponsored or
encouraged investigations to determine which Title I program features appear
to be positively correlated with achievement gains.

Another advantage to these models is the existence of standards which
states can use for advising and monitoring their LEA's evaluation activities.
Some SEA's believe that the lack of such information historically left them
with little basis for insisting upon specific LEA evaluation practices.

State personnel have also noted the benefits of greater attention to
achievement tests: their content, use, selection, etc. For example, the fact
that the models recommend specific procedures appropriate to the test being
used has prompted evaluators to look more deeply into the characteristics of
their tests.

Local personnel welcome the possibility of comparing the outcomes of
their efforts to those of districts they know to be similar. In addition,
LEA's have indicated that the explicit recommended procedures of the new
models are less burdensome than the former federal mandates, which were
unclear.

In summary, we believe that the problems encountered in instituting the
RMC models can be solved. We submit that the advantages of using the models
make the efforts on the part of LEA's and SEA's to accommodate themselves to
the necessary changes well worthwhile.